Optimize global ordinal includes/excludes for prefix matching #14371

Merged
mch2 merged 7 commits into opensearch-project:main from the include_exclude_prefixes branch on Aug 20, 2024


msfroh
Collaborator

msfroh commented Jun 15, 2024

Description

If an aggregation specifies includes or excludes based on a regular expression, and that regular expression has a finite expansion followed by .*, then we can optimize the global ordinal filter.
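To make the condition concrete: a pattern such as (foo|bar)baz.* has the finite expansion {foobaz, barbaz} followed by .*, so it matches exactly the terms that start with one of those two prefixes. The Java sketch below is a hypothetical illustration, not the code in this PR. The PrefixExpansion class is made up, and it only understands literal characters and non-nested groups of literal alternatives before a trailing .*; for any other pattern it returns Optional.empty(), meaning "fall back to the general regex filter".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

public final class PrefixExpansion {

    // Hypothetical helper: expands a restricted pattern made of literal characters and
    // non-nested groups of literal alternatives, followed by a trailing ".*", into the
    // finite set of prefixes it matches. Anything else yields Optional.empty().
    public static Optional<TreeSet<String>> expand(String regex) {
        if (!regex.endsWith(".*")) {
            return Optional.empty(); // no trailing match-anything suffix
        }
        String body = regex.substring(0, regex.length() - 2);
        List<String> prefixes = new ArrayList<>();
        prefixes.add(""); // start from the empty prefix
        int i = 0;
        while (i < body.length()) {
            List<String> choices = new ArrayList<>();
            char c = body.charAt(i);
            if (c == '(') {
                int close = body.indexOf(')', i);
                if (close < 0) {
                    return Optional.empty(); // unbalanced group
                }
                // split with limit -1 keeps empty alternatives such as "(|a)"
                for (String alt : body.substring(i + 1, close).split("\\|", -1)) {
                    if (!isLiteral(alt)) {
                        return Optional.empty(); // nested group or operator inside the group
                    }
                    choices.add(alt);
                }
                i = close + 1;
            } else if (isLiteralChar(c)) {
                choices.add(String.valueOf(c));
                i++;
            } else {
                return Optional.empty(); // '*', '+', '?', '.', '[' ...: not handled here
            }
            // Cross product: extend every prefix built so far by every choice.
            List<String> next = new ArrayList<>();
            for (String p : prefixes) {
                for (String choice : choices) {
                    next.add(p + choice);
                }
            }
            prefixes = next;
        }
        return Optional.of(new TreeSet<>(prefixes));
    }

    private static boolean isLiteralChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    private static boolean isLiteral(String s) {
        for (int k = 0; k < s.length(); k++) {
            if (!isLiteralChar(s.charAt(k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(expand("(foo|bar)baz.*")); // Optional[[barbaz, foobaz]]
        System.out.println(expand("a+b.*"));          // Optional.empty
    }
}
```

The cross product grows quickly (three groups of ten alternatives already give 1,000 prefixes), so a production version would presumably cap the number of expansions and fall back to the regular filter beyond that.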

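Specifically, in this case, we can expand the regular expression into its finite set of matching prefixes, then include/exclude the range of global ordinals that start with each prefix. Because global ordinals are assigned in sorted term order, the terms sharing a given prefix always occupy a single contiguous range of ordinals, so each prefix maps to one range that can be located directly.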
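Once the prefixes are known, each one maps to one contiguous range of ordinals, because terms sharing a prefix are adjacent in sorted order. The following is a minimal sketch under simplifying assumptions: the term dictionary is a sorted String[] whose indices play the role of global ordinals, String.compareTo stands in for Lucene's byte order (prefix ranges are contiguous under either order), and the names PrefixOrdinalFilter and acceptedOrdinals are hypothetical. Two binary searches per prefix find each range, and an exclude filter is simply the complement of the include set.

```java
import java.util.BitSet;
import java.util.List;

public final class PrefixOrdinalFilter {

    // Index of the first term >= key in a sorted term array.
    static int lowerBound(String[] sortedTerms, String key) {
        int lo = 0;
        int hi = sortedTerms.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Index of the first term at or after start that does NOT start with prefix.
    // Valid because, from lowerBound(prefix) onward, the terms sharing the prefix come first.
    static int endOfPrefix(String[] sortedTerms, int start, String prefix) {
        int lo = start;
        int hi = sortedTerms.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[mid].startsWith(prefix)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Accepted ordinals for an include filter (exclude == false) or an exclude filter (exclude == true).
    public static BitSet acceptedOrdinals(String[] sortedTerms, List<String> prefixes, boolean exclude) {
        BitSet bits = new BitSet(sortedTerms.length);
        for (String prefix : prefixes) {
            int start = lowerBound(sortedTerms, prefix);
            bits.set(start, endOfPrefix(sortedTerms, start, prefix)); // one contiguous range per prefix
        }
        if (exclude) {
            bits.flip(0, sortedTerms.length); // complement; a no-op on an empty term dictionary
        }
        return bits;
    }

    public static void main(String[] args) {
        // Ordinal i is the position of a term in the sorted dictionary.
        String[] terms = {"apple", "banana", "barbaz1", "barbaz2", "cherry", "foobaz", "foobazqux", "zebra"};
        List<String> prefixes = List.of("foobaz", "barbaz");
        System.out.println(acceptedOrdinals(terms, prefixes, false)); // {2, 3, 5, 6}
        System.out.println(acceptedOrdinals(terms, prefixes, true));  // {0, 1, 4, 7}
    }
}
```

Against a real index, the range boundaries would more likely be found by seeking in the global ordinals' term dictionary (for example with Lucene's TermsEnum.seekCeil) than by searching an array. The empty-dictionary case is worth testing explicitly, since one of the commits in this PR fixes a bug in the exclude-only case with no doc values in a segment.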

Related Issues

Resolves #14368

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

❌ Gradle check result for 05c8e3c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 02b31a6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Aug 5, 2024

❌ Gradle check result for 0f5d528: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

msfroh force-pushed the include_exclude_prefixes branch from 0f5d528 to f6e1c96 August 5, 2024 21:18
Contributor

github-actions bot commented Aug 5, 2024

❕ Gradle check result for f6e1c96: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@harshavamsi
Contributor

#14289 is the flaky test

msfroh added 7 commits August 19, 2024 18:10
  • Optimize global ordinal includes/excludes for prefix matching
  • Add unit test
  • Add changelog entry
  • Improve test coverage: updated the unit test to be functionally equivalent, but it covers more of the regex logic.
  • Improve test coverage
  • Fix bug in exclude-only case with no doc values in segment
  • Address comments from @mch2

All seven commits are signed off by Michael Froh <froh@amazon.com>.
msfroh force-pushed the include_exclude_prefixes branch from f6e1c96 to 1bc728a August 20, 2024 01:17
Contributor

❌ Gradle check result for 1bc728a: SUCCESS

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

✅ Gradle check result for 1bc728a: SUCCESS

mch2 merged commit 13163ab into opensearch-project:main Aug 20, 2024
37 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 20, 2024
* Optimize global ordinal includes/excludes for prefix matching

If an aggregation specifies includes or excludes based on a regular
expression, and the regular expression has a finite expansion followed
by .*, then we can optimize the global ordinal filter.

Specifically, in this case, we can expand the matching prefixes, then
include/exclude the range of global ordinals that start with each
prefix.

Signed-off-by: Michael Froh <froh@amazon.com>

* Add unit test

Signed-off-by: Michael Froh <froh@amazon.com>

* Add changelog entry

Signed-off-by: Michael Froh <froh@amazon.com>

* Improve test coverage

Updated the unit test to be functionally equivalent, but it covers
more of the regex logic.

Signed-off-by: Michael Froh <froh@amazon.com>

* Improve test coverage

Signed-off-by: Michael Froh <froh@amazon.com>

* Fix bug in exclude-only case with no doc values in segment

Signed-off-by: Michael Froh <froh@amazon.com>

* Address comments from @mch2

Signed-off-by: Michael Froh <froh@amazon.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>
(cherry picked from commit 13163ab)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mch2 pushed a commit that referenced this pull request Aug 20, 2024
#15324)

(cherry picked from commit 13163ab)

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
…arch-project#14371)

shiv0408 added a commit to shiv0408/OpenSearch that referenced this pull request Sep 2, 2024
* Optimize global ordinal includes/excludes for prefix matching (opensearch-project#14371)

akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
…arch-project#14371)

Labels
  • backport 2.x: Backport to 2.x branch
  • enhancement: Enhancement or improvement to existing feature or request
  • Search:Aggregations
  • v2.16.0: Issues and PRs related to version 2.16.0
  • v2.17.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Aggregation include/exclude should support faster filtering on prefixes
4 participants